2 research outputs found
Towards Standardising Reinforcement Learning Approaches for Production Scheduling Problems
Recent years have seen growing interest in using machine learning,
particularly reinforcement learning (RL), for production scheduling problems of
varying degrees of complexity. The general approach is to break down the
scheduling problem into a Markov Decision Process (MDP), whereupon a simulation
implementing the MDP is used to train an RL agent. Since existing studies rely
on simulations, sometimes complex ones, for which the code is unavailable, the
experiments presented are hard or, in the case of stochastic environments,
impossible to reproduce accurately. Furthermore, there is a vast array of RL
designs to choose from. To make RL methods widely applicable in production
scheduling and to establish their strengths for industry, standardised
model descriptions (covering both production setup and RL design) and a
standardised validation scheme are prerequisites. Our contribution is
threefold. First, we standardise the description of production setups used in
RL studies based on established nomenclature. Second, we classify RL design
choices from existing publications. Third, we propose recommendations for a
validation scheme focusing on reproducibility and sufficient benchmarking.
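To illustrate the MDP-plus-simulation approach and the reproducibility concern described above, the following is a minimal sketch and not the setup of any cited study. It assumes a toy single-machine problem in which the state is the current time plus the unscheduled jobs, an action picks the next job, and the reward is the negative tardiness of that job. The class `SingleMachineEnv`, the function `run_random_policy`, and all numeric ranges are hypothetical names and values chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SingleMachineEnv:
    """Toy scheduling MDP (illustrative assumption, not from the paper).

    State:  (current time, remaining jobs)
    Action: id of the job to process next
    Reward: negative tardiness of the job just completed
    """
    n_jobs: int = 5
    seed: int = 0
    rng: random.Random = field(init=False)

    def reset(self):
        # Re-creating the generator from a fixed seed makes the
        # stochastic instance reproducible across runs.
        self.rng = random.Random(self.seed)
        self.time = 0
        # Each job maps to (processing_time, due_date); ranges are arbitrary.
        self.jobs = {
            j: (self.rng.randint(1, 5), self.rng.randint(5, 15))
            for j in range(self.n_jobs)
        }
        return self._state()

    def _state(self):
        return self.time, tuple(sorted(self.jobs.items()))

    def step(self, action):
        proc, due = self.jobs.pop(action)
        self.time += proc
        reward = -max(0, self.time - due)
        done = not self.jobs
        return self._state(), reward, done


def run_random_policy(seed):
    """Roll out a uniformly random policy and return the episode return."""
    env = SingleMachineEnv(seed=seed)
    env.reset()
    policy_rng = random.Random(seed + 1)  # separate, seeded policy randomness
    total, done = 0, False
    while not done:
        action = policy_rng.choice(sorted(env.jobs))
        _, reward, done = env.step(action)
        total += reward
    return total


if __name__ == "__main__":
    # Same seed, same return: the experiment can be repeated exactly.
    print(run_random_policy(42) == run_random_policy(42))
```

Publishing the environment code together with the seeds used is one simple way to make results on a stochastic environment repeatable; an actual RL agent would replace the random policy in this sketch.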
Sheet-Metal Production Scheduling Using AlphaGo Zero
This work investigates the applicability of a reinforcement learning (RL) approach, specifically AlphaGo Zero (AZ), for optimizing sheet-metal (SM) production schedules with respect to tardiness and material waste. SM production scheduling is a complex job shop scheduling problem (JSSP) with dynamic operation times, routing flexibility and supplementary constraints. SM production systems are capable of processing a large number of highly heterogeneous jobs simultaneously. While very large relative to instances in the JSSP literature, the SM-JSSP instances investigated in this work are small relative to the SM production reality. Given the high dimensionality of the SM-JSSP, computation of an optimal schedule is not tractable, and simple heuristic solutions often deliver poor results. We therefore use AZ to selectively search the solution space. To this end, a single-player AZ version is pretrained using supervised learning on schedules generated by a heuristic, fine-tuned using RL and evaluated through comparison with a heuristic baseline and Monte Carlo Tree Search. We show that AZ outperforms the other approaches. The work’s scientific contribution is twofold: On the one hand, a novel scheduling problem is formalized such that it can be tackled using RL approaches. On the other hand, we show that AZ can be successfully modified to provide a solution for the problem at hand, whereby a new line of research into real-world applications of AZ is opened.
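As a rough illustration of the tardiness objective and of what a simple heuristic baseline can look like, the sketch below computes total tardiness for a job sequence and applies the earliest-due-date (EDD) dispatching rule. It makes strong simplifying assumptions: a single machine, fixed processing times, and no material waste or routing flexibility. The abstract does not name the heuristic actually used, so EDD is a stand-in, and the job data and function names are hypothetical.

```python
from typing import List, Tuple

# A job is (name, processing_time, due_date); all values are illustrative.
Job = Tuple[str, int, int]


def total_tardiness(sequence: List[Job]) -> int:
    """Sum of max(0, completion_time - due_date) over a single-machine sequence."""
    time, tardiness = 0, 0
    for _name, proc, due in sequence:
        time += proc
        tardiness += max(0, time - due)
    return tardiness


def edd_schedule(jobs: List[Job]) -> List[Job]:
    """Earliest-due-date rule: process jobs in order of increasing due date."""
    return sorted(jobs, key=lambda job: job[2])


if __name__ == "__main__":
    jobs = [("A", 4, 10), ("B", 2, 3), ("C", 3, 6)]
    print(total_tardiness(jobs))                # 6 (input order A, B, C)
    print(total_tardiness(edd_schedule(jobs)))  # 0 (EDD order B, C, A)
```

A search-based method such as Monte Carlo Tree Search or AZ would explore job sequences and score them with an objective of this kind, rather than committing to one fixed priority rule.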